The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme
Abstract
This paper describes the rationale and design of TXM, a text-mining analysis platform compatible with corpora encoded in XML-TEI. The design synthesizes the best available algorithms from existing textometry software. It also relies on identifying the most relevant open-source technologies for three tasks: processing textual resources encoded in XML and Unicode, running efficient full-text search on annotated corpora, and performing statistical data analysis. The architecture is based on a Java toolbox that connects a full-text search engine component to a statistical computing environment and to an original import environment. The import environment can process a large variety of data sources, including XML-TEI, and apply embedded NLP tools to them. The platform is distributed as an open-source Eclipse project for developers and as two demonstrator applications for end users: a standard application to install on a workstation and an online web application framework.
Similar resources
Porting LooCI from the Contiki Platform to the Zigduino Platform: An Working Approach
Abstract—The Zigduino is an open-source Arduino compatible microcontroller platform with an integrated 802.15.4 radio. The Loosely-coupled Component Infrastructure (LooCI) is a component-based middleware for building sensor network applications that runs on the Contiki operating system, which provides IPv6 networking. In this paper, we describe our approach to, and experiences of porting the Lo...
The Excitement Open Platform for Textual Inferences
This paper presents the Excitement Open Platform (EOP), a generic architecture and a comprehensive implementation for textual inference in multiple languages. The platform includes state-of-art algorithms, a large number of knowledge resources, and facilities for experimenting and testing innovative approaches. The EOP is distributed as an open source software.
Using UIMA to Structure An Open Platform for Textual Entailment
EXCITEMENT is a novel, open software platform for Textual Entailment (TE) which uses the UIMA framework. This paper discusses the design considerations regarding the roles of UIMA within EXCITEMENT Open Platform (EOP). We focus on two points: a) how to best design the representation of entailment problems within UIMA CAS and its type system. b) the integration and usage of UIMA components among...
Beyond Images: Encoding Music for Access and Retrieval
Libraries have embraced the digital encoding of textual documents for improved access and search-based retrieval but have largely ignored similar possibilities regarding the digital encoding of Symbolic Music Representation (SMR) for traditionally notated Western music. This is likely due in part to not only general unawareness by librarians of the prevailing types of Digital SMR formats, but a...
Publication date: 2010